@ChinmayBansal
Contributor

Related Issues

Proposed Changes:

Added filter-based bulk operations to MongoDBAtlasDocumentStore to support production RAG pipelines (parent issue #8508):

  • delete_by_filter(filters): Deletes all documents matching the provided Haystack metadata filters using MongoDB's delete_many() API. Returns the count of deleted documents.

  • update_by_filter(filters, meta): Updates metadata fields for all documents matching the provided filters using MongoDB's update_many() with the $set operator. Fields are written at the meta.{key} path because the document store writes documents with flatten=False, which keeps metadata nested under meta. Returns the count of modified documents.

Both methods include async versions (delete_by_filter_async and update_by_filter_async) and use the existing _normalize_filters() function for consistent filter handling across the document store.
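As a rough sketch of the mapping described above, assuming a pymongo-style collection object that exposes delete_many() and update_many(). The helper names (`build_meta_update`, `delete_matching`, `update_matching`, `delete_matching_async`) are illustrative and are not the integration's actual code, and `mongo_filter` stands for whatever MongoDB query the store produces from the Haystack filters:

```python
from typing import Any


def build_meta_update(meta: dict[str, Any]) -> dict[str, Any]:
    """Build a $set update that touches only the given keys under `meta`."""
    return {"$set": {f"meta.{key}": value for key, value in meta.items()}}


def delete_matching(collection: Any, mongo_filter: dict[str, Any]) -> int:
    # delete_many() returns a result object that exposes deleted_count.
    return collection.delete_many(mongo_filter).deleted_count


def update_matching(collection: Any, mongo_filter: dict[str, Any], meta: dict[str, Any]) -> int:
    # update_many() returns a result object that exposes modified_count.
    result = collection.update_many(mongo_filter, build_meta_update(meta))
    return result.modified_count


async def delete_matching_async(collection: Any, mongo_filter: dict[str, Any]) -> int:
    # Same call, awaited; assumes an async collection with the same method name.
    result = await collection.delete_many(mongo_filter)
    return result.deleted_count
```

The helpers only depend on the method names and result attributes, so they can be exercised against a stub collection without a running database.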

How did you test it?

  • Added integration tests for both sync and async versions:

    • test_delete_by_filter: Verifies selective deletion based on metadata filters
    • test_update_by_filter: Verifies metadata updates for filtered documents
    • test_delete_by_filter_async: Async version of delete test
    • test_update_by_filter_async: Async version of update test
  • All tests validate:

    • Correct count of affected documents
    • Only matching documents are modified
    • Non-matching documents remain unchanged
    • Updated metadata persists correctly
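The shape of these checks can be sketched as follows. `InMemoryStore` is a hypothetical stand-in written for this illustration (it only understands a single equality filter on a meta field); the real tests run against MongoDBAtlasDocumentStore and are not reproduced here:

```python
from typing import Any


class InMemoryStore:
    """Hypothetical stand-in: supports only `==` on one `meta.*` field."""

    def __init__(self, docs: list[dict[str, Any]]) -> None:
        self.docs = {doc["id"]: doc for doc in docs}

    def _matches(self, doc: dict[str, Any], filters: dict[str, Any]) -> bool:
        key = filters["field"].removeprefix("meta.")
        return doc["meta"].get(key) == filters["value"]

    def delete_by_filter(self, filters: dict[str, Any]) -> int:
        doomed = [i for i, d in self.docs.items() if self._matches(d, filters)]
        for doc_id in doomed:
            del self.docs[doc_id]
        return len(doomed)

    def update_by_filter(self, filters: dict[str, Any], meta: dict[str, Any]) -> int:
        # Counts matched documents; a simplification of MongoDB's modified_count.
        hits = [d for d in self.docs.values() if self._matches(d, filters)]
        for doc in hits:
            doc["meta"].update(meta)
        return len(hits)


FILTER_A = {"field": "meta.category", "operator": "==", "value": "A"}


def make_store() -> InMemoryStore:
    return InMemoryStore([
        {"id": "1", "content": "a", "meta": {"category": "A"}},
        {"id": "2", "content": "b", "meta": {"category": "B"}},
        {"id": "3", "content": "c", "meta": {"category": "A"}},
    ])


def test_update_by_filter() -> None:
    store = make_store()
    assert store.update_by_filter(FILTER_A, {"status": "published"}) == 2
    assert store.docs["1"]["meta"] == {"category": "A", "status": "published"}
    assert "status" not in store.docs["2"]["meta"]  # non-matching doc unchanged


def test_delete_by_filter() -> None:
    store = make_store()
    assert store.delete_by_filter(FILTER_A) == 2
    assert list(store.docs) == ["2"]  # only the non-matching doc remains
```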

Notes for the reviewer

  • Implementation follows the same pattern as the merged OpenSearch PR "feat: add delete by filter and update by filer to OpenSearchDocumentStore" (#2407)
  • MongoDB has immediate consistency, so no time.sleep() is needed, unlike with OpenSearch
  • Metadata updates use the meta.{key} path because the document store writes documents with flatten=False, keeping metadata nested under meta (line 737 in document_store.py)
  • Uses native MongoDB deleted_count and modified_count attributes from operation results
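To illustrate why the dotted meta.{key} path matters, here is a minimal sketch that emulates $set on a plain dict. `apply_set` is a hypothetical helper written for this example, not part of pymongo or the integration, and the stored document shape (metadata nested under `meta`) is assumed from the flatten=False note above:

```python
import copy
from typing import Any


def apply_set(doc: dict[str, Any], set_fields: dict[str, Any]) -> dict[str, Any]:
    """Emulate MongoDB's $set on a plain dict, following dot notation."""
    updated = copy.deepcopy(doc)
    for path, value in set_fields.items():
        *parents, leaf = path.split(".")
        target = updated
        for part in parents:
            target = target.setdefault(part, {})
        target[leaf] = value
    return updated


stored = {"id": "1", "content": "hello", "meta": {"category": "news", "year": 2024}}

# Dotted path: only meta.status is written, sibling keys under meta survive.
dotted = apply_set(stored, {"meta.status": "archived"})

# Setting the whole subdocument replaces meta, so category and year are lost.
replaced = apply_set(stored, {"meta": {"status": "archived"}})
```

With the dotted form, an update that sets one metadata key leaves the other metadata on each matching document intact.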

Checklist

@ChinmayBansal requested a review from a team as a code owner on October 28, 2025 06:02
@ChinmayBansal requested a review from mpangrazzi and removed the request for a team on October 28, 2025 06:02
@github-actions bot added the integration:mongodb-atlas and type:documentation labels on Oct 28, 2025
@ChinmayBansal changed the title from "add filter methods to MongoDB DocumentStore" to "feat: add filter methods to MongoDB DocumentStore" on Oct 29, 2025
@mpangrazzi
Contributor

@ChinmayBansal Thanks! Looks almost ok, but I see there are some conflicts to solve. Would you like to take care of them?

@ChinmayBansal
Contributor Author

@mpangrazzi I will take care of them.

Resolved conflicts by keeping both sets of methods:
- Filter methods (delete_by_filter, update_by_filter) from this branch
- delete_all_documents methods from upstream

All methods now coexist in the document store.
@ChinmayBansal
Contributor Author

@mpangrazzi I have fixed the merge conflicts, could you review?

Contributor

@mpangrazzi left a comment

Looks good! Thanks!

@mpangrazzi merged commit be22069 into deepset-ai:main on Nov 6, 2025
10 checks passed

Development

Successfully merging this pull request may close these issues.

add update_by_filter() and delete_by_filter() operations to MongoDBDocumentStore
